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Abstract 

We present a simulation methodology for Bayesian estimation 
of rate parameters in Markov jump processes arising for example 
in stochastic kinetic models. To handle the problem of missing 
components and measurement errors in observed data, we embed 
the Markov jump process into the framework of a general state 
space model. We do not use diffusion approximations. Markov 
chain Monte Carlo and particle filter type algorithms are introduced, 
which allow sampling from the posterior distribution of the rate pa- 
rameters and the Markov jump process also in data-poor scenarios. 
The algorithms are illustrated by applying them to rate estimation 
in a model for prokaryotic auto-regulation and in the stochastic 
Oregonator, respectively. 
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1 Introduction 



It is generally accepted that many important intracellular processes, e.g. gene 
transcription and translation, are intrinsically stochastic, because chemical re- 
actions occur at discrete tim es a s results fro n i rand om molecular collisions 
( McAdams and Arkin (| 19971 ) and lArkin et al.l (jl998l l'). These stochastic ki- 
netic models correspond to a Markov jump proce ss and can thus be simulated 
using techniques such as the Gillespie algorithm (IGillespid (119771)) or - in the 
time-inhomogeneous case - Lewis' thinning method (jOgatal (|l98ll )). Many of 



1 



the parameters in such models are uncertain or unknown, therefore one wants 
to estimate them from times series data. One possible approach is to approx- 
imate the model with a diffusion and then to perfor m Bayesian (static or se- 
quent ia l) inference based on the appro xi mation (see iGolightlv and Wilkinson 
(I2n0fil). iGolightlv and Wilkinson! jloO^), lOolightlv and WilkinsonI ^200^ ) and 
GoUghtlv and WilkinsonI toO^)). T his giv es more flexibility to generate the 
proposals (see Durham and Gallant ( 20021 )). but it is difficult to to quantify 
the approximation error. Depending on the application, it might be preferable 
to work wi t h the original Markov j ump proces s . Thi s possibility is mentioned in 
WilkinsonI (|2006l ). cahpter 10, and lBovs et al.l (I2OO8I ) demonstrate in the case of 
the simple Lotka-Volterra model that this approach is feasible in principle, but 
in more complex situations it is difficult to construct a Markov chain Monte 
Carlo (MCMC) sampler with good mixing properties. The key problems in 
our view are to construct good proposals for the latent process on an interval 
when the values at the two end points are fixed and the process is close to the 
boundary of the state space, and to construct reasonable starting values for the 
process and the parameters, in particular when some of the components are 
observed with small or z ero noise. We pro pose here solution s for both o f these 
problems that go beyond Wilkinson ( 20061 ). Chapter 10, and Bovs et al. ( 20081 ) 
and thus substantially enlarge the class of models that are computationally 
tractable. 

The rest of the paper is organized as follows. In Section [21 we describe the 
model, establish the relation to stochastic kinetics and introduce useful notation 
and densities. In Section [3l we motivate the Bayesian approach and present the 
base frame of the MCMC algorithm. Section[3]describes in detail certain aspects 
of the algorithm, mainly the construction of proposals for the latent Markov 
jump process. In Section [5l the particle filter type algorithm to initialize values 
for the parameters and for the latent Markov jump process is presented. In 
Section [6 we look at two examples. First, the stochastic Oregonator (see 



GillesDiel (|l977l )) is treated in various scenarios, including some data-poor ones, 
to show how the algorithm works. Then, we turn to a. mod el for prokaryotic 
au to-regulation introduced i n ICohghtlv and WilkinsonI (|2005l ) and reconsidered 
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Golightlv and WilkinsonI (|2009l ). Finally, conclusions are given in Section [7| 



2 Setting and definitions 
2.1 Model 

Consider a Markov jump process y = {yt = {yj,. . . ,yf)'^ : t > to} on a state 
space £^ C Nq with jump vectors Ai £ IF for i G {1, . . . ,r} and possibly time 
dependent transition intensities fj,i{t,y) = 6i • hi{t,y): 

F[yt+5 = y + Ai\yt = y] = fi,{t, y)5 + 0(6) {6 > 0). 
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We denote the total transition intensity by 

r 
i=l 

We assume that the functions h = {/ii}ig{i,2,...,r}) called the standardized transi- 
tion intensities, the jump matrix A with columns Ai and the initial distribution 
/o of utg are known. The goal is to estimate the hazard rates 6 = {9i, . . . ,9r) 
from partial measurements xq, xi, . . . , x„ of the process at discrete time points 
= to < ^1 < ■ ■ ■ < ^n- Unobserved components are set as na and we assume 



xi\y 



9r,{-\yti), 



where g^{xi\yti) is a density with respect to some a-finite measure (with possibly 
unknown) nuisance parameter rj. We specify this more precisely in the examples 
in Section [6l 

This framework can be regarded as a general state space model: xq,xi, ... is 
an observed tin ies series which is derived from t he un observable Markov chain 
yt,,,yt-,,. . . fsee lKiinschl (|200d ) or lboucet et al.l (120011 )1. 



For computational reasons, we further assume that we can easily evaluate the 
time-integrated standardized transition intensities 



Hi{s,t,y) := / hi{u,y)du. 
J s 



Models of the above form arise for example in the context of stochastic kinetics. 
Consider a biochemical reaction network with r reactions Ri, . . . , Rr and p 
species Y^, . . . , Y^, i.e., 



Ri : ViiY^ + v,2Y^ + ■■■+ v.pYP 



UiiY^ + uaY'^ • • • + u^pY'P 



for i = l,...,r. Let yl denote the number of species Y^ at time t, yt = 
(yj , . . . , yf V = (vij) and U = (uij). Then, according to the mass action 
law, we can describe {yt : t > Iq} as a Markov jump process with jump matrix 
A = {U — V)'^ and standardized reaction intensities 



For further details, see e.g. lGillesDiel(|l977l ). iGolightlv and WilkinsonI (|2005l l or 
Golightlv and WilkinsonI (j2009l ). We will use in the following terminology from 
this application: We will call the jump times reaction times and classify a jump 
as one of the r possible reaction types. 
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2.2 Additional notation and formulae for densities 



A possible path y[a,b] on an interval [a, b] in our model is uniquely character- 
ized by the total number of reactions ntot, the initial state Ua, the successive 
reaction times a < ti < . . . < Tn^^^ < b and the reaction types (or indices) 
ri, r2, . . . , r^j^t ^ {1; • • • ; '"I- The states at the reaction times are then obtained 
as 



y- 



1=1 



We write for simplicity instead of ■ Furthermore, rl^^ is the total number 
of reactions of type i and rtot is the vector with components r^^j. All these 
quantities depend on the interval [a,b]. If this interval is not clear from the 
context, we write ntot{[a,b]), Tk{[a,b]), etc. 

The density i/jq of y[afi] given is well known, see e.g. Wilkinson ( 20061 ). 
Chapter 10. Defining tq = a, Tntot+i = ^ Ho = Uai it is given by 



ipe{y[a,b]\ya) 



exp 



r r-h \ "tot 

j hi{s,ys)ds • JJi 
i=i / k=i 



hrk{n,yk-i) 



"tot+l r 



exp - ^ ^eiHi{Tk-i,Tk,yk-i) ■ Y\Or^hr^{Tk,yk-i)- 



k=l i=l 



k=l 



In the time- homogeneous case, i.e., hi{t, y) = hi{y), we have Hi{Tk-i,Tk,yk-i) 
hi{yk-i)Sk with 6k = Tk- Tk-i- Therefore 



and 



5k\Tk~i,yk-i ~ Exp(/io(yfc-i)) 
Pin = Ark-i,yk-i\ - 



(1) 



(2) 



fJ-oiVk-i) 

and we ca n exactly simul ate the Markov jump process using the Gillespie algo- 
rithm (see lGillespid (|l977l ')) or some faster versions thereof (see lGibson and Bruck 
( 2OO0I )). Replacing hi{yk-i) by hi{Tk~i,yk-i) in ([I]) and ([2]), this can be done 
"approximately" in the inhomogeneous ca se. An exact simulation algorithm 
based on a thinning method is described in iQgatal (|l98lh . 

We write the density of all observations in [a, b] as 

l,a<ti<b 

where the empty product is interpreted as 1. The joint density of y[tQ,tn] and 
^[to,tn] (given the parameters 6 and r/) is then 



piyito,tn] ' ^ito,tn] v) = Myo) ■ i^e{y[to,t„] \yo) ■ 9vi^ito,tn] \y[to,t„])- 



(3) 
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3 Bayesian approach and Monte Carlo methods 



The maximum likelihood estimator is too complicated to compute because we 
are not able to calculate the marginalisation of the density in ([3]) over y[to,t„] 
explicitly. It seems easier to combine a Bayesian approach with Monte Carlo 
methods, that is we will sample from the posterior distribution of the pa- 
rameters and the underly ing Markov jump process y[to,t„] given the data (see 
Robert and Casellal (|2004l )). This has also the additional advantage that prior 
knowledge about the reaction rates can be used. Assuming 9 and t] to be in- 
dependent a priori, the joint distribution of y[tQ^tn]^ ^[to,tn]i ^ and r] has the 
form 

p(y[to,t„]'^[ioA]'^'^) =p(y[to,in]'^[to,tn]l^'^) •p(^) -pI^)- 



We want to simulate from p{y^f^-^ ^^^, 9 , ijlx^f^ ^^]) , which also yields samples from 
p{6,ri\x[tg^t^]) using a marginalisation over y[to,t„]- The standard approach to 
do this is iterating between blockwise updates of the latent process y[to,t„] 
on sub intervals of [tn,t pj with Me tr opoli s-Hastings ste ps, updates of 9 and 
updates of t? (see e.g. Gilks et al. ( 19961 ). chapter 1, Bovs et al. ( 20081 ) or 
Golightlv and WilkinsonI toO^ )). 

(I2OO8I ). we choose independent Gamma distributions with 



As in iBovs et al 



parameters and /3j as priors for 9i: 



«n^r"'exp(-M) 



1=1 



We write this distribution as Tr{a,f3) where a and j3 are vectors of dimension 
r. Conditionally on y[to,t„]i^[to,tn] aiid 7], the components 9i have then again 
independent Gamma distributions, more precisely 



with 
and 



\yito,t„],x[to,t„],ri ^ o\y[to,t„] ~ [a{yito,t„]),P(.y[to,t„]) 



(4) 



rtn ntot+1 

A(y[to,tn]'ft) = ft + / hi{s,ys)ds = ^ Hi{Tk-i,Tk,yk-i). 

k=l 



Choosing a suitable prior for rj depends heavily on the error distribution, so we 
refer to the examples in Section [6l 

We propose the following algorithm, which will be explained in more detail in 
the next sections. The generation of initial values yjj^^f p G^^^ and rj^^^ will be 
discussed in Section [5l The choice of the set 1^[tQ,tn] of overlapping subintervals 
[a,b] C [to,tn] for updating y will be discussed in Section imi 
Algorithm 3.1 (Simulation from yj^^j^^], 0, 77 given xj^^j^^^]). Form = 1,2, .. . ,M: 
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1. Set = y[;^^, 9 = 9^^-^), rj = Update y^^M for all 
[a,b] £ '^[tctn] sequentially in fixed order by proposing y^^^^ as described 
in sections \4-l\ \4-^ o,nd |^.5| and replacing y\^a,b] by yj^^^ with probability 

a{y[a^]\y[a,b]^^^v) (see mD). Set y^^^^^ = y[to,t„]- 

2. S^mulate ~ • 

3. Generate rj^'^^ given y\Y^\ i in a suitable fashion. 



4 Simulating a path given parameters and observa- 
tions 



We assume now that 9 and r] are fixed and we want to modify y[a^h] on sub 
intervals [a, 6] of [to)*n]- First we consider the case to < a < b < tn where 
the values ya and yb remain unchanged. The boundary cases will be discussed 
in 14.51 Exact methods to simulate from a continuous time Markov chain con- 



ditioned on both endpoints are reviewed and discussed in iHobolt and Stone 
The rejection method is too slow in our examples, and the other two 
require eigendecompositions of the generator matrix. This would require trun- 
cating the state space and is too time-consuming in our examples. Hence we use 
a Metropolis-Hastings procedure. Our proposal distribution q first generates a 
vector of new total reaction numbers r"^^ on [a, b] and then, conditioned on 
^tot"^ generates a value y^^^^y 



4.1 Generating new reaction totals 



Because the values ya and yb are fixed, we must have that 

= yb-ya = Artot ^ A{rzr - not) = 0. (5) 

If rank{A) = r, the reaction totals remain unchanged. Otherwise it is known 
that {x £l7 : A-x = 0} forms a lattice and can be written as {ai-vi+- ■ ■+ad-Vd '■ 
ai, . . . , Orf G Z} with d = dim(ker(j4)) and basis vectors vi £ Z^, / G {1, 2, . . . , d} 
(note that these vectors are not unique). Appendix|A]describes how to compute 
a basis vector matrix 

V{A) = {vi,...,vd). 
This enables us to generate a vector r^^^ which respects ([5]) in a simple way: 

r^r = rtot + V{A).Z, Z ^ qf , (6) 

where qf is a symmetric proposal distribution q^ on Z*^, i.e., q^{z) = qf{—z), 
with parameter i. If r^^^ has a negative component, we stop and set y^^b] ~ 

y[a,b] ■ 
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4.2 Generating a new path given the reaction totals 



The new path y\^a,b] depends only on ya and the new reaction totals rf^t", and 
not on the old path y^afi] ■ The constraint y^^^ = yi, is satisfied automatically by 
our construction of r'^of" ■ Therefore our algorithm simply generates a path on 
[a, b\ with given initial value and given reaction totals, and we can omit the su- 
perscripts new. » generate the path according to r independent 



inhomogeneous Poisson processes with intensities 

h{t) = fii{a,ya)^ — - + f^i{b,yb)^ 



^nei«,i^ In situations where the standardized reaction 
intensities hi depend strongly on y, this proposal often generates paths that 
are impossible under the model. This is typically the case when the number of 
molecules of some species is small. Our proposal first decides the order in which 
the reactions take place, that is we first generate for k = 1,2, ... , ntot- In 
a second step, we generate the reaction times r^, taking into account both the 
probability of a reaction of a given type at the current state of the process and 
the remaining number of reactions of type i after time that still have to 
occur in order to reach the prescribed total. In order to make the description of 
the algorithm easier to read, we mention that is a first guess for t/^-i (needed 
only if the intensities are time inhomogeneous). Also remember that yk = yT^- 
Algorithm 4.1 (Generating y[a,b] given rtot and ya). 

1. Set SI = rl^t /or i e {1, . . . , r} and yo = ya- 

2. For k = 1, . . . , ntot do the following: 

Set tl = a + {b-a){k-l)/ntot- If fJ-iitl^yk-i) = for all I with > 0, 
stop. Otherwise, generate rj. with probabilities 



V[rk = i]^^Sl_^^ii{tl,yk-i). (7) 

Ifrk = i, set SI = Sl_^ - I, S[ = S[_^ for I i and yj, = yt-i + A- 

3. Generate {5k', k E {1, . . . ,ntot + 1}) according to a Dirichlet distribution 
with parameter a = {a^', k G {1, . . . , ntot + 1}) where 

^ _ „ ^ E^/^o^(^^?/^-l) /ox 

and set = Tk-i + (6 - a)5k for k = 1, . . . , ntot- 

The algorithm stops in step 2 when we can no longer reach the state yi, on a 
possible reaction path using the available remaining reactions. This means that 
an impossible path is proposed which has acceptance probability 0. 

The heuristics behind the steps in the above algorithm is the following. The 
probabilities d?]) are an attempt to reach a compromise between the probability 
of a reaction of type i at the current state according to the law of the process 
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and the remaining number of reactions of type i that still have to occur in 
order to reach the prescribed total. Empirically, we found that choosing these 
probabilities proportional to the the geometric mean leads to good acceptance 
rates in the examples in Section O The Dirichlet distribution in Q is used as 
an approximation of the distribution of independent exponential-(/Uo(*fc) yfc-i)) 
waiting times 6^ conditioned on the event that their sum is equal to b — a. If 
all /io(ifc, yfc-i) are equal, the conditional first two moments are 



E 



and 



Var 



I 



(b-a) 



(6 



El E [Si] 



(9) 




and moreover the conditiona l distribution is Dir i chlet w ith parameters = 
1, scaled by b — a, see e.g. Bickel and Doksum ( 19771 ) . Section 1.2. In the 
general case, we use a Dirichlet distribution as approximation and determine 
the parameters such that the expectation matches the right-hand side of ^ 
for all k. This implies that at oc /Iq ^(t^, y^-i)- Finally, the proportionality 
factor is determined such that the sum of the variances matches the sum of the 
right-hand side of ()10p . 



4.3 Acceptance probability of a new path 

By construction, the proposal density q{y^^]j!^\y[a,b]i(^) has the form 

Because of the symmetry of qf , we have 



t I 



new\ 
tot ) 



So it will cancel out in the acceptance probability and we do not need to consider 
it. 

Next, qiy[a,b]\ya,rtot,0) is equal to 

^/si-MtlVk-i) /P- ((Tfc - Tfe_i)/(6 -a):ke{l,..., ntot + 1}) 



n 



]ntot 



k=iEi=i^JsLMthyk^i) ' 

where f^" is the density of the Dirichlet distribution with parameter a from 



Hence, according to the Metropolis-Hastings recipe, the acceptance probability 
is 



, „e«„ n ^ ■ /i My[a7]\y<'^9r,{x[a4y]'^J^)q{y[a,b]\ya,rtot,0) \ 

«(y[a,6] \yia,b] ,0,r]) =mm<l, —— , , , r x . ,„e«;i , ^new n^ ( ' 

[ V9[yiaM\ya)9v\^[aM\yia.M)'l'yyia,b]\y<^^^tot ) 



(11) 
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4.4 Choice of the sub intervals [a, b] 

To ensure that the process can be updated on the whole interval [toi^n], we 
have to choose a suitable set of sub intervals I[to^tn] which we apply the 
above updating algorithms. As a general rule, one can say that they should be 
overlapping. Also it is often useful to include sub intervals which do not lead 
to a change of the process at the observation times ti < t2 < • • • < tn- In such 
situations, the terms 9r](,x^a,b]\y^ah\^ 5'»?(^[a,6] |y[a,fe]) equal and therefore 
cancel out in the acceptance probability. 

In cases where the observations are complete and noise-free, we need only sub- 
intervals of the form However, because it is sometimes a non-trivial 
problem to find a realization of the process which matches all observations, we 
found that it is sometimes useful to include a tiny noise in the model and to 
choose also sub-intervals with a as interior point. By this trick we can often 
obtain realizations that match all observation by the above updating algorithms. 

In general, good choices of the sub-intervals can be very dependent on the given 
situation. The standard one is to let consist of all intervals of the form 

and +tfc)/2, (tfc +tfc+i)/2]. 

4.5 Updating the path at a border 

In the cases b = or a = to we also want to change the values of yt^ and t/tg , 
respectively (unless /o is a Dirac measure). We recommend to propose first a 
change in r'^^^^, that is 

rZ'" = rtot+r', r' ^ ql', , (12) 

where (/[",' is a symmetric distribution on W . Then either ya or remains 
unchanged and the other value follows from yb — ya = ^i^toT- The rest can 
be done again with Algorithm 14.11 If y^^"^ ^ yt^, the factor fo{y'^g^)/ foiuto) 
is needed additionally in the acceptance probability (|lip . In the examples, we 
discuss how to proceed if we want to change only some components of yt^ or 
yt„, respectively. 

5 Initialisation of 77, 9 and 2/[to,t„] 

The form of the trajectories of the underlying Markov jump process depends 
strongly on the parameter 6 and the value at to- So just choosing rj^^^ and 0^^^ 
and then simulating y^^^ ^ j leads usually to processes which match the observed 
data badly. It then takes very many iterations in the algorithm until we obtain 
processes that are compatible with the data. 

In our experience, generating the starting values by algorithm 15. II below leads to 
substantial increases in computational efficiency. It is inspired by the particle 
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filter: We select the most likely particle, perform a number of Metropolis- 
Hastings steps (similarly to lGilks and Berzuinil (j200lh ) and propagate with the 
Gillespie algorithm. 

An additional trick can bring further improvement. Because the speed of the 
techniques described depends heavily on the number of reactions in the system, 
one wants to ensure that the initial value y[to,t„] for Algorithm 13.11 has rather 
too few than too many reactions. We can achieve this with a simple shrinkage 
factor V between and 1 for Q during the initialisation, that is replacing Q after 
simulation with v ■ 9. This acts like a penalisation on the reaction numbers: It 
does not affect the probabilities in ([2]) (the time- homogeneous case) , but makes 
the system slower, resulting in fewer reactions. 
Algorithm 5.1 (Generating starting values). 

1. Choose 7]^^\ 

2. Simulate S^^^ i.i.d starting values yf^ ~ p{yto\xto) (^^^d generate y^f^^^ for 
sG{1,2,...,5'^^J'} using the Gillespie algorithm with the normalized stan- 
dardized reaction intensities I{/j.>o} = 1; • • • o,nd equal hazard rates 
l/(ti -to). Set yf/^l^^j = yj^'^^^^p where s' = argmax,{g^(o)(a;[t„,t^]|y^j^_j^j)}. 
S^^^l^t^ 0{i> ~ r, (a(y«^p,^(yj;|,^,)). 

3. Forl = l,...,n-l: 

a) Use mW steps of algorithm I3.il on [to,ti] with shrinkage factor v 
and starting values V^t^t^ o,nd O^''^ to generate V^t^tJ ^^f^d 6^^^"^^ . 

h) Generate S^^^ paths y[tQti+-^] which are independent continuations of 
y^^lj on {ti,tij^i\, based on the Gillespie algorithm with 9^^^^^ . Set 

'[to,ti+i] ^[to,ti+i] 



= yfk^M+iV ^'^^^^ = argmax,{<7^(o)(xz+i|y|^^J}. 



4. Set 0(0) = and yf^J,^, = yf,;;> 



So to propagate to the process on the interval (t;, (for / = 1, . . . , n — 1), we 
use 9^^^^^ which should roughly follow the distribution of ^Ixjip^^^], r/('^\ because 
of step 3. a). 



6 Examples 

6.1 Stochastic Oregonator 

First we consider the stochastic Oregonator to illustrate the algorithms. It is 
a highly idealized model of the Belousov-Zhabotinskii reactions, a non-linear 
chemical oscillator. It has 3 species and the following 5 reactions: 
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Ri 


y2 




R2 


yi+y2 _ 





Rs 













R5 


y3 





For further details, see laillesDiel (jl977l V Following Section EH the process 
{yt : t > to}, where yt = {Vt ^VuVt)'^ and yl is the number of species at time 
i, is a Markov jump process with standardized reaction intensities 



,3\T 



My) = (r,yy,y\y'(y -i)/2,r) 



and the jump matrix 







-1 1 
-1 
1 




1 



-1 



As starting distribution, we use the uniform distribution on {0, . . . , K}^ with 
K = 25. The measurement errors are normally distributed with precision ry, 
that is 



n 



exp 



(13) 



In Figure [H a sample trajectory for 6 = (0.1,0.1,2.5,0.04,1) and rj = 1/2, 
simulated with the Gillespie algorithm, is shown, observed every 0.5 units of 
time during a time period of 20. 

If we choose a Gamma(a;, /3) prior for rj, then the full conditional distribution 
of rj in the posterior is again a Gamma distribution with parameters 



"''(^^[to. 



tn] ' 



a) = " + e {1,. . . ,n} X {1,. . . ,r} : xj / na} 



and 



This yields a simple way to perform step 3. in Algorithm 13.11 

We now want to estimate the parameters and the Markov jump process from 
the observations at the discrete times T = {0,0.5, . . . ,20} given in Figured] in 
various scenarios. The total raction numbers for the true underlying Markov 
jump process are rtot = (76, 417, 518, 92, 508)"^. 

A) Exact observation of every species, i.e., we observe {yt : t € T}. 

B) Observation of every species with errors, i.e., we observe {xt : t G T}. 

C) Observation of species Yi and Y2 with errors, i.e., we observe {{xl,Xf) : 
t e T}. 
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D) Observation of species Yi and with errors, i.e., we observe {{xl,x^) : 
t e T}. 

E) Observation of species Y2 and Y3 with errors, i.e., we observe : 
t e T}. 




2 4 6 8 10 12 14 16 18 20 

t 



Figure 1: Observations (squares), (circles) and x^ (triangles) of the Oreg- 
onator model for t E T. The true Markov jump process is indicated as solid 
line. 



6.1.1 Specifications of tiie algorithm 

We specify the proposal distributions and further details in our algorithm as 
follows. A basis vector matrix is given by 

T^/ 4^ / 1 -1 1 Y 
= ( 1 1 1 1 j 

and we simulate Z in ([6]) with (Si^i, ^2^2)"^, where P[Bi = ±1] = 0.5, Bi ~ 
Bin(2, l), L = 0.4 and all random numbers are independent. For the new reaction 
number at the beginning on the interval [tn-i,tn] or at the end on the interval 
[tn-i,tn], we want updates which change only one component of ytg or yt^, 
respectively, to get better acceptance. In order to do this, we need the integer 
solutions to 

A.,,Xr?r - not) = 

for j £ {1,2,3}, where j4_j^. denotes the reaction matrix without the j-th 
row. With the techniques from Appendix [XI we find exemplarily for j = 1 the 
basis vectors vi = (1,-1,0,0,0)^, V2 = (0,0,0,1,0)^ and V3 = (0,1,1,0,1)'^. 
Because the last one is already in the kernel of A, we can restrict ourselves to 
vi and V2 for the update, i.e., we choose one of these or their additive inverses 
with equal probability and add it to the total reaction number to get the new 
one. We proceed analogously for j = 2 and j = 3. 

For the parameters 6, we use r(0.1, 1) priors. For the scenarios B) to E), r] 
is unknown. We use r] = 10 during the initialisation and a r(0, 0) improper 
prior afterwards so that tj can be updated with Gamma distributions. For the 
initialisation (Algorithmic]), we use M{'> and S'^'} around 100 to 200 and slight 
shrinking. 
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We use the standard set of sub-intervals described in Subsection 14.41 Without 
further tuning, we obtained average acceptance rates of 5% - 7% in all the sce- 
narios. The running time for the lOO'OOO iterations of Algorithm 13.11 together 
with the initi alisation (Algorithm [51) , code d in th e language for statistical com- 
puting R (see R Development Core Team ( 2O10l )). was about 38 hours on one 
core of a 2.814 GHz Dual-Core AMD Opteron(tm) Processor 2220 with 32'000 
MB RAM. A significant speed up is expected from coding in C. 



6.1.2 Results 




Figure 2: Traces for the parameters in scenario B for the Oregonator example 
with a thinning factor 10. The origin on the abscissa marks the last iteration 
of the initialisation (Algorithm [5]) . True values are indicated with a horizontal 
line. 

In Figure El we show the trace plots for the parameters {9, rj) exemplarily in 
scenario B. On the whole, mixing seems satisfactory, although not ideal for 
some parameters. In addition, we can see that the initialisation process yields 
starting values which are already very close to the true values. 

In Figure [31 we compare the posterior densities estimated from the last SO'OOO 
of lOO'OOO iterations of Algorithm 13.11 in the different scenarios. The vertical 
dotted line indicates the true values of 6 and rj, respectively. We find that in all 
the scenarios the true values of 9 are in regions where the posterior density is 
high. In the cases where some component is not observable, the uncertainty is 
bigger, especially for the reaction rates corresponding to standardized transition 
intensities which depend on this component. For example in scenario E, is 
not observed, leading to a loss in terms of precision for reactions rates 02, O3 
and 64. The posterior densities of rj seem rather spread out, but the mode is 
found nicely. 

Figure [H displays estimates and point- wise 95% confidence bands of the latent 
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Figure 3: Posterior densities of the parameters 9 and r/ for the Oregonator model 
in the scenarios A (thick-sohd) , B (thin-sohd), C (dashed), D (dotted) and E 
(dot-dashed), estimated from the last SO'OOO of lOO'OOO iterations of Algorithm 
13. 1[ True values are indicated with a vertical dotted line. 



components in the process for the scenarios C to E. For comparison, we also 
indicate the true values of the latent component with a thin line. We can see 
that they nicely lie within our confidence bands. 



6.2 Prokaryotic auto-regulation 



We look at the simplified mode l for prokaryotic auto -regulation introduced in 



Golightlv and WilkinsonI (|2005l ^ and reconsidered in iGolightlv and WilkinsonI 



(|2009l ). It is described by the following set of 8 chemical reactions. 



Ri 


DNA + P2 - 


DNAP2 


R2 


DNAP2 


DNA + P2 


Rs 


DNA 


-)■ DNA RNA 




RNA 


RNA + P 




2P 


P2 


Rq 


P2 


2P 


Rj 


RNA 





Rs 


P 






In this system, the sum DNAP2 + DNA remains constant, and we assume that 
this constant K is known and equal to 10 in our simulation. Therefore it is 
enough to consider the four species y = (y^, y^, y^, y^)^ = (RNA, P, P2, DNA)^, 
where RNA, P, P2 and DNA are now interpreted as numbers of the correspond- 
ing species. According to the mass action law, the standardized transition 
intensities are 

h{y) = (DNA X P2,K- DNA,DNA,RNA,P x (P - l)/2, P2, RNA, P)^ 
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Figure 4: Estimates (thick solid lines) and point-wise 95% confidence bands 
(indicated by the dashed lines) of the latent components for the Oregonator 
model in the scenarios C (top), D (middle) and E (bottom), respectively, esti- 
mated from the last 50'000 iterations of Algorithm 13. H thinned by a factor of 
10. The true values are shown as thin line. 



and the jump matrix is given by 



A :-- 



/OOlOO 0-1 0\ 

1 -2 2 -1 

-110 1-10 

V -1 1000 0/ 



As start distribution, we assume that the number of DNA molecules is uni- 
formly distributed on {0, . . . , Ky and the other species are initially 0. Following 
Golightlv and WilkinsonI jiS), we again use normally distributed measure- 
ment errors, see (jl3p . The update for r/ (step 4. in Algorithm 13. ip . can then be 
done using Gamma distributions. Also we consider three scenarios in a similar 
manner to the last example. 

A) Exact observation of every species. 

B) Observation of every species with errors. 

C) Observation of species RNA, P and P2 with errors. 

True values of the parameters are 6 = (0.1,0.7,0.6,0.085,0.05,0.2,0.2,0.015), 
r] = 0.5 and we observe the process at the integer times 0, . . . , 50. The to- 
tal raction numbers for the true underlying Markov jump process are rtot = 
(192, 190, 122, 53, 116, 99, 117, 7)^. 
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6.2.1 Transformation of parameters 



As re ported in iGolightlv and Wilkinsonl (|2005l ^ and iGoligiitlv and Wilkinsonl 



(|2009h . ratios of the parameters ^1/^2 and 9^/0^, connected to the reversible 



reaction pairs R2 and -R5, i?6) respectively, are more precise than the in- 
dividual rates. We found a similar behavior also for and 64/ Og. This is 
related to the fact that adding or subtracting an equal number of the corre- 
sponding reaction between two consecutive observation times does not change 
the values of the Markov jump chain at these time points, making it rather 
difficult to tell how many of these reaction events should be there from discrete 
observations only. This implies also that there is a strong positive dependence in 
the posterior between these pairs of parameters, slowing down the convergence 
of the algorithm. 

It is therefore much better to use the following reparameterization 

Pl=Oi+ 02, P3 = 03 + Oj, P5 = 04 + 08, P7 = 05 + 06 

and 

01 03 04: 05 

P2 = n T-^T' Pi = n —?r^ Pe = n —^T^ PS 



1 + 02 03 + 07 04 + 08 05 + 06 

We use r(a, /3) priors with a = 0.1 and /? = 1 for pi {I = 1, 3, 5, 7) and B(l, 1) 
priors, i.e., uniform priors on [0,1], for {k = 2,4,6,8). For updating e.g. 
(pi,P2), we factor the joint density of {pi,P2) given y[f(,,t„] as p{pi\p2)p{p2)- 
Then p{pi\p2) is a r(a + r^ii + rf^^, /3 + P2I1 + (1 - P2)h)) density, and 

p{p2) « (/? + P2h + (1 - p2)/2)-("+^^+^^V^Hl - /'2)^^ 

with h = El=i^' hiiyk-i)dk and h = EfeLI^' ^2(yfc-i)(^fc. The factor (/? + 
P2I1 + (1 — P2)l2)~^°'~^^^~^^^^ can be approximated with piecewise linear upper 
bounds, so we can simulate from p{p2) using an adaptive accept-reject-method 
with mixtures of truncated Beta distributions as proposals. 



6.2.2 Specifications of the algorithm 

The basis vector matrix is given by 

/llOOOOOOX'^ 
Y(A\ _ 00001100 
^ ~ 1 1 

yoooioooi/ 

and for qf we choose qfi+Ci) = 0.1 for i G {1, 2, 3, 4} and gf (0) = 0.2 (see ©). 

To get the new total reaction number for the update at the beginning, i.e., on 
the interval [to,ii], we have to respect that yj^ = y^^ = yf^ = 0. We therefore 
only want to change the fourth component of ytg- So 

^-4,.(rS,r(y[to,ti]) - rtot{y[toM)) = 0- 
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With the techniques from Appendix]^ we find the same basis vectors as in V{A) 
plus the vector = (0,-1,0,2,1,0,0,0)^. So we use with ^^(±^5) = 0.5. 
Updating the total reaction number at the end on the interval t„] is done 

as in the previous example. 



6.2.3 Results 



T I I I I I I I I I r 

50 150 250 350 450 
Iteration/1 00 



CO ^ : 



"I — I — I — I — I — I — I — I — I — I — r 

50 150 250 350 450 
Iteration/100 




T 1 — r 

150 250 350 
Iteration/1 00 



1 I I I I I I I I I r 

50 150 250 350 450 
Iteration/1 00 



1 1 1 1 1 1 1 1 1 — r 

50 150 250 350 450 
Iteration/1 00 



1 — I — I — I — I — I — I — I — I — I — r 

50 150 250 350 450 
Iteration/1 00 



1 — I — I — I — I — I — I — I — I — I — r 

50 150 250 350 450 
Iteration/1 00 



~i — I — I — I — I — I — I — I — I — I — r 

50 150 250 350 450 
Iteration/1 00 



Figure 5: Traces for the parameters for the prokaryotic auto-regulation model 
in scenario A with a thinning factor 100. The origin on the abscissa marks the 
last iteration of the initialisation (Algorithm [5]) . True values are indicated with 
a horizontal line. 



The running time for the SO'OOO iterations of Algorithm 13.11 together with the 
initi alisation (Algorithm [SD , cod e d in t he language for statistical computing R 



(see R Development Core Team ( 2O10l )). was about 16 hours on on one core 



of a 2.814 GHz Dual-Core AMD Opteron(tm) Processor 2220 with 32'000 MB 
RAM. Average acceptance rates were around 15% - 25%. 

In Figure El we show the trace plots of the initialisation and 50'000 iterations of 
Algorithm 13. II for the scenario A. For the parameter pi, the mixing is somewhat 
problematic. 

In Figure [6l we can see, as expected, that the posterior densities of p2, Pi, pe 
and ps are far more concentrated than those of pi, p3, ps and py. Nevertheless, 
all posterior densities go well with the true values. When the number of DNA 
molecules is not observed (scenario C), the posteriors of pi and and ps are 
spread out, so estimating them in this scenario seems rather hard. 

Finally, we show estimate and the point-wise 95% confidence band of the latent 
component in scenario C, that is the number of DNA molecules, in Figure \7\ 
The results contain the true values quite nicely. 
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0.0 0.2 0.4 0.6 0.8 1.0 
rho2 




Figure 6: Posterior densities of the parameters p and r] for the prokaryotic auto- 
regulation model in the scenarios A (thick-solid), B (thin-solid) and C (dashed), 
estimated from the last 40'000 of SO'OOO iterations (thinned with factor 100) of 
Algorithm 13.11 True values are indicated with a vertical dotted line. 



7 Conclusions 



In this paper, we have presented a technique to infer rate constants and latent 
process components of Markov jump processes from time series data using fully 
Bayesian inference and Markov chain Monte Carlo algorithms. We have used 
a new proposal for the Markov jump process and - exploiting the general state 
space framework - a filter type initialisation algorithm to render the problem 
computationally more tractable. Even in very data-poor scenarios in our exam- 
ples, e.g. one species is completely unobserved, we have been able to estimate 
parameter values and processes and the true values are contained in posterior 
confidence bands. 

The techniques are generic to a certain extend, but as our examples have shown, 
they have to be adapted to the situation at hand, which makes their blind appli- 
cation rather difficult. Clearly, the speed of our algorithm scales with the num- 
ber of jump events, so they are less suitable in situations with many jumps. In 
such a situation, using the diffusion approximation is recommended. However, 
we believe that the statement "It seems unlikely that fully Bayesian inferential 
techniques of practical value can be developed based on the original Markov 
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Figure 7: Estimates (thick solid lines) and 95% confidence intervals (indicated 
by the dashed lines) of the latent components in the scenario C (prokaryotic 
auto-regulation model), estimated from the last 40'000 iterations of Algorithm 
13.11 thinned by a factor of 10. True values are shown as thin line. 



jump process formulation of stochastic kinetic model s, at least given currently 
available computing hardware" in the introduction of iGolightlv and Wilkinson 
is too pessimistic. 



A Integer solutions of Homogeneous Linear Equa- 
tions 

Let A £ Mpxr(^) be an integer p x r matrix. We want to determine the set 

h = {xer' ■.Ax = 0}. (14) 

Obviously, it is enough to consider only linear independent rows of A, so we 
assume rank{A) = p < r. The case p = r is then trivial, so p < r. The main 
idea is to transform the matrix A into the so called Hermite normal form. For 



the following, see I Jaeerl (j200ll ). 

Definition A.l (Hermite normal form). A matrix H G Mpxr(^) with rank s 
is in Hermite normal form if 

1. . . . ,is with 1 < ii < ■ ■ ■ < is ^ P with Hi.j G ^\{0} for 1 < j < s. 

2. Hij = for I < i < ij - 1, 1 < j < s. 

3. The columns s + 1 to r are 0. 

I [Hi^^i/H,^j\ =0 forl<l<j <s. 
Proposition A.l. For every A G Mpxr(^) exists a unique unimodular matrix 
U (U € GLr{1) := {B G M^xr(Z) : det{B) = ±1}), such that H = AU is in 
Hermite normal form. 

There exist many algorithms to calculate H and U, see e.g. iJage 2 (jiooil l. 



The Hermite normal form allows us to determine the set (|14p . Because A 
is assumed to have maximal rank, by definition H = (-8,0), where B is an 
invertible, lower triangular p x p matrix. For y = U~^x we have the equation 

= Ax = AUy = {B,0)y, 



19 



so y has zeroes in the first p positions and arbitrary integers in the remaining 
positions. Hence a basis vector matrix V for ([HD is given by Vi = u ^+j . If 
necessary, one can reduce the length of the Vi by the algorithm 2.3 in iRiplev 
(fl987.1. 
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